Text keyword extraction method based on word frequency statistics

LUO Yan, ZHAO Shuliang, LI Xiaochao, HAN Yuhui, DING Yafei

Journal of Computer Applications 2016, 36 (3): 718-725. DOI: 10.11772/j.issn.1001-9081.2016.03.718

To address the low efficiency and poor accuracy of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm in keyword extraction, a text keyword extraction method based on word frequency statistics was proposed. Firstly, a formula for the number of same-frequency words (words that occur the same number of times) in a text was derived from Zipf's law; secondly, this formula was used to determine the proportion of words at each frequency in a text, showing that most words are low-frequency words; finally, the word frequency statistics law was applied to keyword extraction, yielding a TF-IDF algorithm based on word frequency statistics. Simulation experiments were conducted on Chinese and English text data sets. The average relative error of the same-frequency word formula was no more than 0.05, and the maximum absolute error of the proportion of words at each frequency was 0.04. Compared with the traditional TF-IDF algorithm, the TF-IDF algorithm based on word frequency statistics achieved higher average precision, average recall and average F1-measure, and a lower average runtime. The simulation results show that, in text keyword extraction, the TF-IDF algorithm based on word frequency statistics outperforms the traditional TF-IDF algorithm in precision, recall and F1-measure, and effectively reduces keyword extraction runtime.
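
The abstract does not state the same-frequency formula itself. A commonly cited derivation from Zipf's law (rank × frequency ≈ constant) gives the share of distinct words that occur exactly n times as 1/(n(n+1)). The sketch below assumes that form, which may differ from the paper's, and pairs it with a plain traditional TF-IDF baseline; it does not reproduce the authors' improved algorithm, and all function names are hypothetical.

```python
import math
from collections import Counter


def same_frequency_proportion(n: int) -> float:
    """Expected share of distinct words that occur exactly n times.

    Assumption: the classic Zipf-based estimate I_n / D = 1 / (n * (n + 1)),
    obtained from rank * frequency = const; the paper's own formula may differ.
    """
    return 1.0 / (n * (n + 1))


def observed_proportions(tokens):
    """Observed share of distinct words occurring exactly n times, keyed by n."""
    freq = Counter(tokens)               # word -> number of occurrences
    by_count = Counter(freq.values())    # n -> number of words occurring n times
    distinct = len(freq)
    return {n: k / distinct for n, k in sorted(by_count.items())}


def tfidf_keywords(docs, doc_index, top_k=3):
    """Traditional TF-IDF baseline: top_k words of docs[doc_index].

    docs is a list of tokenized documents (lists of words).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))              # document frequency of each word
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically for stable output
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    # Under this estimate, words occurring 1-3 times make up 1 - 1/4 = 75% of the vocabulary
    print(sum(same_frequency_proportion(n) for n in range(1, 4)))
    docs = [["zipf", "law", "word"], ["word", "frequency"], ["word", "tfidf"]]
    print(tfidf_keywords(docs, 0))
```

Summing 1/(n(n+1)) for n = 1..N telescopes to 1 - 1/(N+1), so under this estimate half of all distinct words occur only once, which is consistent with the abstract's observation that most words are low-frequency words.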

Relationships retrospect algorithm on kinship network

GUO Ruiqiang, YAN Shaohui, ZHAO Shuliang, SHEN Yufeng

Journal of Computer Applications 2014, 34 (7): 1988-1991. DOI: 10.11772/j.issn.1001-9081.2014.07.1988

A kinship network is made up of marriage and parent-child relationships, and searching for a specific relationship in a huge kinship network is very difficult. This paper proposed two algorithms that extend the breadth-first search method: radius-search and directional-search. The kinship network data were extracted from the Hebei province population database and comprised about 4,150,000 vertices and about 10,880,000 edges. The network stored bidirectional relationships, which avoided some unnecessary backtracking. The experimental results show that the kinship retrospect algorithms can accurately locate specific persons in the network, while achieving high performance and retaining high flexibility.
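
The abstract describes the two searches only at a high level. The sketch below is one hypothetical reading: every relation is stored in both directions (the bidirectional storage described above), radius-search is a breadth-first search bounded by hop count, and directional-search follows a given sequence of relation labels. The class and method names and the "parent"/"child"/"spouse" labels are assumptions, not taken from the paper.

```python
from collections import defaultdict, deque


class KinshipGraph:
    """Hypothetical kinship graph with relations stored in both directions."""

    def __init__(self):
        # person -> list of (relation, other person); storing each relation
        # from both ends means a search never has to scan for reverse edges
        self.adj = defaultdict(list)

    def add_parent(self, parent, child):
        self.adj[parent].append(("child", child))
        self.adj[child].append(("parent", parent))

    def add_marriage(self, a, b):
        self.adj[a].append(("spouse", b))
        self.adj[b].append(("spouse", a))

    def radius_search(self, start, radius):
        """BFS bounded by `radius`: every relative within that many hops, with distance."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            person = queue.popleft()
            if dist[person] == radius:
                continue                  # do not expand past the radius
            for _, other in self.adj[person]:
                if other not in dist:
                    dist[other] = dist[person] + 1
                    queue.append(other)
        return dist

    def directional_search(self, start, path):
        """Level-by-level BFS that follows only the relation labels in `path`."""
        frontier = {start}
        for relation in path:
            frontier = {other for person in frontier
                        for rel, other in self.adj[person] if rel == relation}
        frontier.discard(start)           # e.g. a person is not their own sibling
        return frontier
```

For example, `directional_search(person, ("parent", "parent"))` returns grandparents, and `("parent", "child")` returns siblings, including half-siblings.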

Visualization of multi-valued attribute association rules based on concept lattice

GUO Xiaobo, ZHAO Shuliang, ZHAO Jiaojiao, LIU Jundan

Journal of Computer Applications 2013, 33 (8): 2198-2203.

To address the shortcomings of traditional association rule visualization approaches, namely that they cannot display frequent patterns or the relationships between items, offer only a single form of expression, and in particular are ill-suited to representing multi-schema association rules, a new visualization algorithm for multi-valued association rule mining was proposed. It redefined and classified multi-valued attribute data using a concept lattice, and presented the multi-valued attribute items of frequent itemsets and association rules in a concept lattice structure. The method achieves both frequent itemset visualization and multi-schema visualization of association rules, covering one-to-one, one-to-many, many-to-one, many-to-many and concept hierarchy types. Finally, the advantages of the new method were illustrated with experimental data drawn from the demographic data of a province, for which visual representations of the source data, frequent patterns and association relations were also produced. The practical application analysis and experimental results show that the proposed scheme gives better visual effects for frequent itemset display and for faithful multi-schema association rule visualization.
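
One hypothetical way to realize the pipeline described above: multi-valued attributes are nominally scaled into "attribute=value" items, the intents of formal concepts with enough support (the frequent closed itemsets that form the lattice nodes) are enumerated, and rules between nested intents are labeled one-to-one, one-to-many and so on by the sizes of their two sides. This is a brute-force sketch for small tables, not the paper's algorithm; drawing the lattice and the concept hierarchy view are omitted, and all function names are assumptions.

```python
from itertools import combinations


def scale(records):
    """Nominal scaling: each (attribute, value) pair becomes one item 'attr=value'."""
    return [frozenset(f"{a}={v}" for a, v in rec.items()) for rec in records]


def frequent_intents(objects, min_support):
    """Intents of formal concepts (closed itemsets) with support >= min_support.

    Every intent with a non-empty extent is an intersection of object intents,
    so we close the set of object intents under pairwise intersection.
    """
    intents = set(objects)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common not in intents:
                intents.add(common)
                changed = True
    support = {}
    for intent in intents:
        count = sum(1 for obj in objects if intent <= obj)
        if intent and count >= min_support:   # skip the empty intent
            support[intent] = count
    return support


def rule_type(antecedent, consequent):
    """Label a rule by the number of items on each side."""
    left = "one" if len(antecedent) == 1 else "many"
    right = "one" if len(consequent) == 1 else "many"
    return f"{left}-to-{right}"


def concept_rules(support, min_conf):
    """Rules X -> (Z - X) for nested frequent intents X < Z (the lattice order)."""
    rules = []
    for x, sx in support.items():
        for z, sz in support.items():
            conf = sz / sx
            if x < z and conf >= min_conf:
                y = z - x
                rules.append((x, y, conf, rule_type(x, y)))
    return rules
```

For a toy table with attributes such as sex and education level, `concept_rules(frequent_intents(scale(records), 2), 0.5)` yields rules like "sex=F -> edu=college" tagged "one-to-one".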

Metagraph for genealogical relationship visualization

LIU Jundan, ZHAO Shuliang, ZHAO Jiaojiao, GUO Xiaobo, CHEN Min, LIU Mengmeng

Journal of Computer Applications 2013, 33 (7): 2037-2040. DOI: 10.11772/j.issn.1001-9081.2013.07.2037

To address the poor readability and understandability of existing display forms for genealogical data, this paper presented a metagraph-based visualization of genealogical data. In the metagraph representation of a genealogy, the generating set comprised all persons in the family, and each edge represented only a "parents-child" relationship. Each edge was a pair consisting of an invertex and an outvertex: the invertex consisted of the two nodes in a marital relationship, and the outvertex was a set containing a single child node. The experimental results show that, for the same data, the metagraph form has almost half as many edges as the common form, and the visualization effect is significantly improved. The proposed method also offers guidance for the mathematical modeling of genealogies, research on genealogy visualization and the improvement of genealogical information systems.
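
A minimal sketch of the metagraph construction, assuming each birth record is a hypothetical (father, mother, child) triple: the generating set is every person mentioned, and each edge pairs an invertex (the couple) with an outvertex (a one-child set). The comparison function counts a simplified ordinary graph with separate father-child and mother-child edges; with that counting each child costs one metagraph edge instead of two, consistent with the roughly halved edge count reported above. The paper's exact counting (for example of marriage edges) may differ, and the function names are assumptions.

```python
def generating_set(births):
    """All persons appearing in the (father, mother, child) records."""
    return {person for record in births for person in record}


def metagraph_edges(births):
    """One edge per child: (invertex = married couple, outvertex = {child})."""
    return [(frozenset({father, mother}), frozenset({child}))
            for father, mother, child in births]


def common_form_edges(births):
    """Simplified ordinary graph: separate father->child and mother->child edges."""
    edges = []
    for father, mother, child in births:
        edges.append((father, child))
        edges.append((mother, child))
    return edges


if __name__ == "__main__":
    births = [("A", "B", "C"), ("A", "B", "D"), ("C", "X", "E")]
    print(len(metagraph_edges(births)), len(common_form_edges(births)))
```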
